Plan for week 14-18 November 2022

Monday 14/11

09:00-09:45 Introduction and set-up

10:00-11:30 Visualization and mapping with coordinate reference systems

11:30-12:00 Discussion

13:00-14:00 Visualization and mapping with coordinate reference systems (exercises)

14:00-14:15 #30DayMapChallenge https://30daymapchallenge.com/ https://fosstodon.org/web/tags/30DayMapChallenge https://twitter.com/search?q=%2330DayMapChallenge

14:30-15:30 Finding systematic patterns in variable/geographic space

Tuesday 15/11

09:00-09:45 Jakub Nowosad (zoom) Q&A Finding systematic patterns in geographic space

10:00-11:00 Finding systematic patterns in variable/geographic space (exercises)

11:00-12:00 External libraries, links to GRASS GIS, raster/terra

13:00-13:30 Geodemographics

13:30-14:30 Links to GIS, raster/terra, geodemographics (exercises)

14:45-15:45 Spatio-temporal cubes, juridical change

Wednesday 16/11

09:00-10:00 Spatio-temporal cubes, juridical change (exercises)

10:15-11:15 Global and local spatial autocorrelation

11:15-12:00 Data puzzles (support)

13:00-14:15 Global and local spatial autocorrelation (exercises)

14:30-15:30 Residual autocorrelation (exercises)

Thursday 17/11

09:00-10:15 Spatial autocorrelation and machine learning

10:30-11:30 Point pattern analysis

11:30-12:00 Point pattern analysis (exercises)

13:00-14:30 Transport, networks, transmission

14:45-15:45 Transport, networks, transmission (exercises)

Friday 18/11

Depending on the number of participants wishing to present project outlines (about 25 minutes each: 10-15 minutes presentation, 10-15 minutes discussion), the number of slots may vary.

09:00-12:00 Presentation of up to six project outlines, with 20 min. break.

12:30-15:30 Presentation of up to six project outlines, with 20 min. break.

15:30-16:00 Round-up and feedback

Introduction and set-up

Auditorium M is larger than the room initially assigned, and has been upgraded recently. We’ll find out practically how to get into this room (Door B, up the stairs, turn left); in principle all teaching rooms are access-card-only, so those of us with cards will need to help others. Canvas messages could be used to contact others if you get stuck outside a locked door (the Canvas app can be set up to issue notifications when a message is received). Because anyone with a card can access the room, the room cannot be physically locked, so we can agree among ourselves on who stays in the room for each of the two halves of the lunch break, to avoid packing up laptops. We’ll have to learn to live with this; upgraded rooms have both advantages and disadvantages.

We can probably stay in one part of the room; as far as I understand, the active camera is the one facing the front. Microphones need to be used to be heard when streaming is activated. The streaming times are set at 09:00-12:00 and 13:00-15:45 Monday to Thursday, and 09:00-12:00 and 12:30-16:00 on Friday. Streamed content should be available in the Panopto Video tab in Canvas, and is only available to those registered on this course and logged in. Raw recordings will be published successively in the same place with little delay, for those at an awkward time-zone offset. Subsequently, edited recordings cutting out periods with no relevant content will replace the raw recordings. The recordings should show a view of the front of the room, beamer content, and sound captured by the microphones.

Those who are not on-site will have to use Canvas messages to interact with us; this is not ideal, but the video system has no two-way option. At the time of writing nothing has happened to force a reversion to hybrid delivery, so delivery will be on-site plus streaming. Should anyone needing to present on Friday be prevented from attending, we can fall back on a zoom session for everybody early the following week.

The local weather forecast may be found at https://www.yr.no/en/forecast/daily-table/1-92388/Norway/Vestland/Bergen/Sandviken. Since working in auditorium M may be a bit constricting, we may break out of the schedule if the sun comes out (or even if it doesn’t), and for example walk into town (or part of the way) from mid-afternoon to open up for more informal discussions. For those from outside Bergen, the 7-day public transport ticket is convenient, for example using the Skyss app, see https://www.skyss.no/en for information.

Wifi internet is available throughout the NHH campus. If you already use eduroam, you may have been connected automatically through your home institution. If not, or if your eduroam connection does not work here, please consult https://www.nhh.no/en/about-nhh/it-support/it-support-for-guests/; unfortunately identity confirmation requires a Norwegian phone number, so please assist each other in getting online.

Visualization and mapping with coordinate reference systems

Visualization is one of the key strengths of R. However, visualization involves many choices, and R offers such a wide range of options that guidance is desirable. There are two major underlying technologies, base graphics and grid graphics, and several toolboxes built on these (trellis graphics and grammar of graphics on grid graphics), in addition to JavaScript widgets for interaction. In addition, much work has been done on the effective use of shapes and colours: https://colorbrewer2.org, https://hclwizard.org/ (and https://www.nature.com/articles/nmeth.1618). See also the recent AltText movement: https://webaim.org/techniques/alttext/ https://support.microsoft.com/en-us/office/everything-you-need-to-know-to-write-effective-alt-text-df98f884-ca3d-456c-807b-1a1fa82f5dc2

We can distinguish between presentation graphics and analytical graphics. Presentation graphics are intended for others and are completed (even if interactive). Analytical graphics may evolve into presentation graphics, but their main purpose is to visualize the data being analysed (see Antony Unwin’s book, http://www.gradaanwr.net/, and Claus Wilke’s https://clauswilke.com/dataviz/). Many of the researchers who have developed approaches to visualization have been involved with Bell Labs, where S came from: https://priceonomics.com/how-william-cleveland-turned-data-visualization/.

As installed, R provides two graphics approaches, one known as base graphics, the other trellis or lattice graphics. Most types of visualization are available in both, but lattice graphics were conceived to handle conditioning, for example to generate matching plots for different categories. Many of these designs were based on the data-ink ratio, favouring graphics with little or no extraneous detail (Edward Tufte) - see Lukasz Piwek’s blog http://motioninsocial.com/tufte/. There are other approaches, such as Leland Wilkinson’s Grammar of Graphics, implemented in Hadley Wickham’s ggplot2 package, which we will also be using here.

So there are presentation and analytical graphics, and there can be a number of approaches to how best to communicate information in either of those modes. R can create excellent presentation graphics, or provide input for graphics designers to improve for print or other media. What we need now are the most relevant simple analytical graphics for the kinds of data we use.

Histograms and many other kinds of chart require user choices about the number of bins to be used and bin/class intervals. This may also be termed quantization: the division of part of the real line on which we have measured a variable into intervals. This can also apply to combined categories if they are recoded to reduce the number of alternatives to be displayed. Class intervals are much used in thematic cartography, and I’m the author of the classInt package.
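As a minimal base-R sketch of this kind of quantization (the data and class boundaries below are simulated for illustration), cut() assigns observations to intervals on the real line:

```r
# Quantize a simulated variable into four fixed class intervals
set.seed(1)
x <- runif(100, min = 0, max = 100)
brks <- c(0, 25, 50, 75, 100)
# include.lowest=TRUE closes the first interval at its lower bound
cls <- cut(x, breaks = brks, include.lowest = TRUE)
table(cls)  # counts of observations falling in each interval
```

The same mechanism underlies choropleth class assignment: once observations carry interval labels, colours can be keyed to the labels.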

Class intervals can be chosen in many ways, and some have been collected for convenience in the classInt package. The first problem is to assign class boundaries to values in a single dimension, for which many classification techniques may be used, including pretty, quantile, natural breaks among others, or even simple fixed values. From there, the intervals can be used to generate colours from a colour palette, using the very nice colorRampPalette() function. Because there are potentially many alternative class memberships even for a given number of classes, choosing a communicative set matters.
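For example, colorRampPalette() takes a small set of anchor colours (chosen arbitrarily here) and returns a function that interpolates any requested number of colours between them:

```r
# colorRampPalette() returns a palette-generating function, not a colour vector
pal_fun <- colorRampPalette(c("lightyellow", "orange", "red3"))
pal_fun(5)  # five hex colour codes, one per class interval
```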

We may choose the number of intervals ourselves arbitrarily or after examination of the data, or use provided functions, such as nclass.Sturges(), nclass.scott() or nclass.FD(). In hist(), nclass.Sturges() is used by default. We can also split the data at zero using sign(), but handling diverging intervals often involves more work.

The default intervals for bins in hist() are pretty(range(x), n = breaks, min.n = 1), where breaks <- nclass.Sturges(x). pretty() computes a sequence of about n+1 equally spaced ‘round’ values covering the range of the values in x; the values are chosen like values of coins or banknotes (1, 2, 5, etc.).
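We can check this by recomputing the default breaks ourselves on simulated data, and compare the bin counts suggested by the three rules:

```r
set.seed(101)
x <- rnorm(500)
# the three bin-count rules give different suggestions
c(Sturges = nclass.Sturges(x), Scott = nclass.scott(x), FD = nclass.FD(x))
# hist() default: Sturges' count passed through to pretty()
brks <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)
h <- hist(x, plot = FALSE)
all.equal(h$breaks, brks)  # TRUE: the same breaks hist() chose
```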

If we use the classIntervals() function from classInt, we can pass through arguments to the function called through style=, and note that n will not necessarily be the number of output classes. By default, intervalClosure= is "left", so [-30, -20) means numbers greater than and equal to (>=) -30 and less than (<) -20; [10, 20] is numbers >= 10 and <= 20.
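A short sketch, assuming the classInt package is installed (the data are simulated):

```r
library(classInt)
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
# n= is a suggestion; styles such as "pretty" may return a different count
ci <- classIntervals(x, n = 5, style = "quantile", intervalClosure = "left")
ci$brks   # six boundaries delimiting five classes
print(ci) # shows counts per interval, with left-closed intervals
```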

The tmap package https://r-tmap.github.io/tmap/ now provides cartographically informed, grammar of graphics (gg) based functionality, like ggplot2 built on grid graphics. John MacKintosh tried with ggplot2 https://www.johnmackintosh.net/blog/2017-08-22-simply-mapping/, with quite nice results. I suggested he look at tmap, and things got better https://www.johnmackintosh.net/blog/2017-09-01-easy-maps-with-tmap/, because tmap can switch between interactive and static viewing. tmap also provides direct access to classInt class intervals. Like the sf::plot() method, tmap plotting can use classInt internally, and accepts a palette (try looking at tmaptools::palette_explorer() for ColorBrewer palettes, or at the rcartocolor package).

The underlying logic of conditioned graphics is that multiple displays (windows, panes) use the same scales and representational elements for comparison. This can be done manually, imposing the same scales, colours and shapes in each plot and laying the plots out in a grid. Trellis graphics automated this in S, and lattice provides similar but enhanced facilities in R with a formula interface; ggplot2 and other packages provide similar functionality.
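lattice is a recommended package shipped with R; a minimal conditioned plot using the built-in iris data, with one panel per category and identical axes in each panel:

```r
library(lattice)
# the formula's | operator specifies the conditioning variable
p <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris,
            layout = c(3, 1))
class(p)  # "trellis": the plot is drawn only when the object is printed
```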

In early R, all the graphics functionality was in base; graphics was split out of base in 1.9.0 (2004-04-12), and grDevices in 2.0.0 (2004-10-04). When R starts now, the graphics and grDevices packages are loaded and attached, ready for use. graphics provides the user-level graphical functions and methods, especially the most used plot() methods that many other packages extend. grDevices provides lower-level interfaces to graphics devices, some of which create files and others display windows. The capabilities() function shows what R itself can offer, including non-graphics capabilities, and we can also check the versions of external software used:

capabilities()
##        jpeg         png        tiff       tcltk         X11        aqua 
##        TRUE        TRUE        TRUE        TRUE        TRUE       FALSE 
##    http/ftp     sockets      libxml        fifo      cledit       iconv 
##        TRUE        TRUE       FALSE        TRUE       FALSE        TRUE 
##         NLS       Rprof     profmem       cairo         ICU long.double 
##        TRUE        TRUE       FALSE        TRUE        TRUE        TRUE 
##     libcurl 
##        TRUE
grSoftVersion()
##                    cairo                  cairoFT                    pango 
##                 "1.17.6"                       ""                 "1.50.9" 
##                   libpng                     jpeg                  libtiff 
##                 "1.6.37"                    "6.2" "LIBTIFF, Version 4.4.0"

Most of base graphics is vector graphics, but some innovations apply both to base and grid. The gridBase package permits base graphics elements, often created as plot() methods in contributed packages, to be placed in grid graphics displays.

We can check the relative standing of graphics and grid from the CRAN package database, and add lattice and ggplot2 (packages listed in Suggests are typically used in examples or vignettes):

db <- tools::CRAN_package_db()
types <- c("Depends", "Imports", "Suggests")
pkgs <- c("graphics", "grid", "lattice", "ggplot2")
(tbl <- sapply(types, function(type) sapply(pkgs, 
  function(pkg) length(db[grep(pkg, db[, type]), 1]))))
##          Depends Imports Suggests
## graphics     289    2084       90
## grid          80     800      299
## lattice      117     263      267
## ggplot2      370    2693     1203
class(tbl)
## [1] "matrix" "array"
library(lattice)
barchart(tbl, auto.key=TRUE, horizontal=FALSE)

grid and lattice entered as Recommended packages in R 1.5.0 (April 2002), and grid became a base package in 1.8.0 (October 2003). Some changes were made in grid for R 3, but its structure remains very stable. The gridBase and gridGraphics packages provide functions for capturing the state of the current device drawn with base graphics tools. One reason for this is the unsolved problem of testing graphics output for identity, to ensure that the same commands for the same data give the same output; for grid objects this is feasible, but not for base graphics on interactive devices. Over and above the use of grid directly, the general-purpose packages lattice and ggplot2 build on grDevices and grid. In addition, it is worth mentioning the vcd (visualizing categorical data) and vcdExtra packages and a recent book, http://ddar.datavis.ca/.

We can combine grobs (grid graphical objects) from different sources:

b <- barchart(tbl, auto.key=TRUE,
  horizontal=FALSE)
x11()
barplot(t(tbl), legend.text=TRUE,
  args.legend=list(x="top", bty="n",
  cex=0.8, y.intersp=3))

gridGraphics::grid.echo()

library(grid)
g <- grid.grab()

dev.off()
## png 
##   2
grid.newpage()
gridExtra::grid.arrange(g, b, ncol=2)

grid pushes viewports onto a stack, then pops them off; see the R Graphics book and the grid vignettes:

grid.rect(gp = gpar(lty = "dashed"))
vp <- viewport(width = 0.5, height = 0.5)
pushViewport(vp)
grid.rect(gp = gpar(col = "grey"))
grid.text("quarter of the page", y = 0.85)
pushViewport(vp)
grid.rect()
grid.text("quarter of the\nprevious viewport")
popViewport(2)

Just reading the print method for ggplot objects shows how close grid lies beneath ggplot2.

ggplot2:::print.ggplot
## function (x, newpage = is.null(vp), vp = NULL, ...) 
## {
##     set_last_plot(x)
##     if (newpage) 
##         grid.newpage()
##     grDevices::recordGraphics(requireNamespace("ggplot2", quietly = TRUE), 
##         list(), getNamespace("ggplot2"))
##     data <- ggplot_build(x)
##     gtable <- ggplot_gtable(data)
##     if (is.null(vp)) {
##         grid.draw(gtable)
##     }
##     else {
##         if (is.character(vp)) 
##             seekViewport(vp)
##         else pushViewport(vp)
##         grid.draw(gtable)
##         upViewport()
##     }
##     if (isTRUE(getOption("BrailleR.VI")) && rlang::is_installed("BrailleR")) {
##         print(asNamespace("BrailleR")$VI(x))
##     }
##     invisible(x)
## }
## <bytecode: 0x5f933d8>
## <environment: namespace:ggplot2>

There are various lower- and higher-level ways of combining graphical output; some are described in the egg package vignette, https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html:

df <- cbind(expand.grid(pkgs, types), n=c(tbl))
library(ggplot2)
gg <- ggplot(df, aes(x=Var1, y=n, fill=Var2)) + geom_col() + 
  xlab("") + guides(fill=guide_legend(title="")) +
  theme(legend.position="top")
gg2 <- ggplotGrob(gg)
t <- gridExtra::tableGrob(as.data.frame(tbl))
gridExtra::grid.arrange(g, b, gg2, t, ncol=2, nrow=2)

Coordinate reference systems

https://geocompr.robinlovelace.net/spatial-class.html#crs-intro https://geocompr.robinlovelace.net/reproj-geo-data.html https://clauswilke.com/dataviz/geospatial-data.html#projections https://r-spatial.org/book/02-Spaces.html https://riatelab.github.io/bertin/ https://observablehq.com/@neocartocnrs/hello-bertin-js https://observablehq.com/collection/@neocartocnrs/bertin https://observablehq.com/@neocartocnrs/bertin-js-projections?collection=@neocartocnrs/bertin https://observablehq.com/collection/@d3/d3-geo-projection

Mapping

library(sf)
## Linking to GEOS 3.11.0, GDAL 3.6.0, PROJ 9.1.0; sf_use_s2() is TRUE
chicago_tracts <- st_read("chicago_tracts.gpkg")
## Reading layer `chicago_tracts' from data source 
##   `/home/rsb/und/ecs530/ECS530_h22/chicago_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 2195 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.94229 ymin: 40.73651 xmax: -86.92936 ymax: 42.66976
## Geodetic CRS:  NAD83

We’ll use the tmap package with class interval styles from the classInt package:

library(tmap)
tm_shape(chicago_tracts) + tm_fill("med_inc_cv", style="fisher", n=7, title="Coefficient of Variation")

In ESRI documentation, CV thresholds of 12 and 40 percent are proposed for the transformed reported MOE values: https://doc.arcgis.com/en/esri-demographics/data/acs.htm. We’ll create a classified variable (ordered factor):

chicago_tracts$mi_cv_esri <- cut(chicago_tracts$med_inc_cv, c(0, 0.12, 0.40, Inf), labels=c("High", "Medium", "Low"), right=TRUE, include.lowest=TRUE, ordered_result=TRUE)
table(chicago_tracts$mi_cv_esri)
## 
##   High Medium    Low 
##   1400    766     29

and map it:

tm_shape(chicago_tracts) + tm_fill("mi_cv_esri", title="Reliability")

As the Low reliability tracts are small in size, the "view" mode for interactive mapping may help:

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(chicago_tracts) + tm_fill("mi_cv_esri", title="Reliability")
tmap_mode("plot")
## tmap mode set to plotting

Or equivalently using mapview, plotting the CV values rather than the ordered factor, which is not yet well-supported:

library(mapview)
mapviewOptions(fgb = FALSE)
mapview(chicago_tracts[,"med_inc_cv"], layer.name="Coefficient of Variation")

The RColorBrewer package provides, by permission, the ColorBrewer palettes from the ColorBrewer website. Note that ColorBrewer limits the number of classes tightly: sequential palettes allow only 3–9 classes.
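Assuming RColorBrewer is installed, brewer.pal() returns a palette by name, and the brewer.pal.info table records the per-palette class limits:

```r
library(RColorBrewer)
# request 7 of the (at most 9) classes in the sequential "Blues" palette
pal <- brewer.pal(7, "Blues")
pal
brewer.pal.info["Blues", "maxcolors"]  # the upper class limit for this palette
```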

We can also display all the ColorBrewer palettes:

library(RColorBrewer)
display.brewer.all()

Try exploring alternative class interval definitions and palettes, maybe also visiting http://hclwizard.org/ and its hclwizard() Shiny app, returning a palette generating function on clicking the “Return to R” button:

library(colorspace)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)

pal <- hclwizard()
pal(6)

The “end of the rainbow” discussion, on avoiding rainbow palettes, is informative:

wheel <- function(col, radius = 1, ...)
  pie(rep(1, length(col)), col = col, radius = radius, ...) 
opar <- par(mfrow=c(1,2))
wheel(rainbow_hcl(12))
wheel(rainbow(12))

par(opar)

https://clauswilke.com/dataviz/geospatial-data.html https://r-spatial.org/book/08-Plotting.html https://geocompr.robinlovelace.net/adv-map.html https://walker-data.com/census-r/mapping-census-data-with-r.html https://rspatial.org/terra/index.html https://rspatial.org/terra/rosu/index.html https://riatelab.github.io/mapsf/